Estimating the Gumbel Scale Parameter for Local Alignment of Random Sequences by Importance Sampling with Stopping Times.
Authors
Abstract
The gapped local alignment score of two random sequences follows a Gumbel distribution. If computers could estimate the parameters of the Gumbel distribution within one second, the use of arbitrary alignment scoring schemes could increase the sensitivity of searching biological sequence databases over the web. Accordingly, this article gives a novel equation for the scale parameter of the relevant Gumbel distribution. We speculate that the equation is exact, although present numerical evidence is limited. The equation involves ascending ladder variates in the global alignment of random sequences. In global alignment simulations, the ladder variates yield stopping times specifying random sequence lengths. Because of the random lengths, and because our trial distribution for importance sampling occurs on a different sample space from our target distribution, our study led to a mapping theorem, which led naturally in turn to an efficient dynamic programming algorithm for the importance sampling weights. Numerical studies using several popular alignment scoring schemes then examined the efficiency and accuracy of the resulting simulations.
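As a rough illustration of importance sampling with a stopping time, the following toy sketch replaces alignment scores with a one-dimensional random walk whose steps are N(-mu, 1); this model, the function names, and the default parameters are all assumptions made for illustration and are not the article's algorithm. The walk is simulated under an exponentially tilted trial distribution until it first reaches a threshold y, and each run is weighted by the likelihood ratio at that stopping time. The exponential decay rate of the resulting tail probability plays the role that the Gumbel scale parameter plays for alignment scores.

```python
import math
import random


def tail_prob_is(y, mu=0.5, trials=2000, seed=0):
    """Estimate P(max_n S_n >= y) for a walk with N(-mu, 1) steps.

    Toy stand-in for the alignment setting (an assumption, not the
    article's method).
    """
    rng = random.Random(seed)
    # theta solves E[exp(theta * X)] = 1 for X ~ N(-mu, 1): theta = 2 * mu.
    theta = 2.0 * mu
    total = 0.0
    for _ in range(trials):
        s = 0.0
        # Simulate under the tilted (trial) law N(+mu, 1) until the stopping
        # time tau = first n with S_n >= y; positive drift makes tau finite.
        while s < y:
            s += rng.gauss(mu, 1.0)
        # Importance weight: likelihood ratio of target to trial law at tau.
        total += math.exp(-theta * s)
    return total / trials


def decay_rate(y1, y2, **kwargs):
    """Slope of -log P(max >= y) between two thresholds y1 < y2."""
    p1 = tail_prob_is(y1, **kwargs)
    p2 = tail_prob_is(y2, **kwargs)
    return -math.log(p2 / p1) / (y2 - y1)
```

With the default mu = 0.5, a call such as decay_rate(5.0, 10.0) should return a value close to theta = 1.0. Because the tilted walk always crosses the threshold, every simulated path contributes to the estimate, which is why this kind of scheme can be far more efficient than direct simulation of a rare event.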
Related articles
The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment
The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter lambda and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) all use time-consuming computer simulations to determine the Gumbel parameters. Because the sim...
Sequential-Based Approach for Estimating the Stress-Strength Reliability Parameter for Exponential Distribution
In this paper, two-stage and purely sequential estimation procedures are considered to construct fixed-width confidence intervals for the reliability parameter under the stress-strength model when the stress and strength are independent exponential random variables with different scale parameters. The exact distribution of the stopping rule under the purely sequential procedure is approximated ...
Score distributions of gapped multiple sequence alignments down to the low-probability tail.
Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution ...
Statistical Significance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models
The score statistics of probabilistic gapped local alignment of random sequences is investigated both analytically and numerically. The full probabilistic algorithm (e.g., the "local" version of maximum-likelihood or hidden Markov model method) is found to have anomalous statistics. A modified "semi-probabilistic" alignment consisting of a hybrid of Smith-Waterman and probabilistic alignment is...
Local Field Correction Effect on Dicluster Stopping Power in a Strongly Coupled Two-Dimensional Electron Gas System
We calculate the stopping power for heavy-ion diclusters moving in a strongly coupled two-dimensional electron gas system by using the local field corrected dielectric function at finite temperature. We obtain a parameterized local field correction factor based on a relation between the thermal compressibility and exchange-correlation energy in two dimensions. The interpolated parameter is deriv...
Journal:
- Annals of Statistics
Volume 37, Issue 6A
Pages: -
Publication date: 2009